feat(runners): add mi325x-vultr launch script by Oseltamivir · Pull Request #1738 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-13T01:45:54Z

Add runners/launch_mi325x-vultr.sh for the Vultr MI325X fleet. Modeled on launch_mi325x-amds.sh (same SKU, same compute partition, same single-node salloc/import/srun flow and *_mi325x.sh bench invocation), with two cluster-specific paths:

- enroot cache (import layer cache + imported .sqsh) at /enroot/sa
- pre-staged model weights / HF hub cache at /nfsdata/sa/models/, bind-mounted over the container HF_HUB_CACHE so `hf download "$MODEL"` reuses the staged models--org--name caches instead of re-downloading from HF.

Both paths are node-local ext4 at the same path on every compute node; import and run share one Slurm job on a single node, so node-local storage suffices.
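A minimal sketch of how the two paths might be used follows. It is not the contents of launch_mi325x-vultr.sh: the function names, the .sqsh naming scheme, and the container-side cache path `/hf-hub-cache` are assumptions for illustration, and only `/enroot/sa` and `/nfsdata/sa/models` come from the description above. The functions print commands rather than run them, so the sketch works without Slurm or enroot installed.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the contents of launch_mi325x-vultr.sh.
set -euo pipefail

ENROOT_DIR="/enroot/sa"             # from the PR: enroot layer cache + imported .sqsh
STAGED_MODELS="/nfsdata/sa/models"  # from the PR: pre-staged HF hub cache
CONTAINER_HF_CACHE="/hf-hub-cache"  # assumption: container-side HF_HUB_CACHE path

# Map an image reference to a squashfs file under the enroot directory.
# The naming scheme (replace '/' and ':' with '_') is hypothetical.
sqsh_path() {
  local image="$1"
  local name="${image//\//_}"
  name="${name//:/_}"
  printf '%s/%s.sqsh\n' "$ENROOT_DIR" "$name"
}

# Print the enroot import command; ENROOT_CACHE_PATH points the layer cache
# at the same node-local directory as the imported image.
import_cmd() {
  local image="$1"
  printf 'ENROOT_CACHE_PATH=%s enroot import -o %s docker://%s\n' \
    "$ENROOT_DIR" "$(sqsh_path "$image")" "$image"
}

# Print the bind-mount spec that overlays the staged cache on HF_HUB_CACHE.
hf_mount_spec() {
  printf '%s:%s\n' "$STAGED_MODELS" "$CONTAINER_HF_CACHE"
}

import_cmd "vllm/vllm-openai-rocm:minimax-m3"
hf_mount_spec
```

On a pyxis-enabled cluster, a spec like this would typically be passed to `srun --container-mounts=...` alongside `--container-image=` pointing at the imported .sqsh, with HF_HUB_CACHE set to the container-side path inside the job.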

Note

Low Risk
Changes are additive benchmark/CI infrastructure (configs, launcher, shell recipe) with no production auth or data-path logic; main risk is long, resource-heavy CI sweeps on new hardware.

Overview
Adds day-zero MiniMax-M3 MXFP8 single-node vLLM benchmarking on the Vultr MI325X fleet, alongside infrastructure to run it in CI.

A new mi325x-vultr runner pool (six GitHub runners) is wired to launch_mi325x-vultr.sh, which follows the existing MI325X Slurm/enroot flow but uses Vultr-specific enroot cache (/enroot/sa), staged HF hub cache bind-mount (/nfsdata/sa/models/), and Slurm node excludes for known-bad hosts.

minimaxm3-fp8-mi325x-vllm in amd-master.yaml registers MiniMaxAI/MiniMax-M3-MXFP8 on vllm/vllm-openai-rocm:minimax-m3 with fixed-seq-len sweeps (1k1k / 8k1k) over TP4/TP8, TEP (EP4/EP8), and DEP. TP2 is omitted, unlike in the B300 config, because the ~444 GB MXFP8 checkpoint would OOM on 256 GB GPUs at that degree.
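The TP2 omission can be checked with rough arithmetic. The sketch below assumes weights split evenly across tensor-parallel ranks and ignores KV cache, activations, and runtime buffers, so it gives only an upper bound on free memory. The function name is hypothetical, and the two constants are the approximate figures quoted in the summary.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope check; ignores KV cache, activations, and runtime overhead.
set -euo pipefail

WEIGHTS_GB=444  # approximate MXFP8 checkpoint size quoted in the summary
GPU_GB=256      # per-GPU memory quoted in the summary

# Print the per-GPU weight share and remaining memory for a TP degree,
# assuming an even split across ranks (integer GB, share rounded down).
headroom_for_tp() {
  local tp="$1"
  local per_gpu=$(( WEIGHTS_GB / tp ))
  local free=$(( GPU_GB - per_gpu ))
  printf 'TP%s: ~%s GB weights per GPU, ~%s GB left\n' "$tp" "$per_gpu" "$free"
}

for tp in 2 4 8; do
  headroom_for_tp "$tp"
done
```

Under these assumptions TP2 leaves roughly 34 GB per GPU for everything other than weights, while TP4 and TP8 leave well over 100 GB, which is consistent with the sweep starting at TP4.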

minimaxm3_fp8_mi325x.sh implements the ROCm recipe: mandatory --block-size 128, TRITON_ATTN, --language-model-only, conc-scaled CUDA graphs, extended engine ready timeout, and standard MI325X ROCm env (AITER, HIP/Ray). perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 0bd8981.

Add runners/launch_mi325x-vultr.sh for the vultr mi325x fleet. Modeled on launch_mi325x-amds.sh (same SKU, same compute partition, same single-node salloc/import/srun flow and *_mi325x.sh bench invocation), with the two cluster-specific paths:

- enroot cache (import layer cache + imported .sqsh) at /enroot/sa
- pre-staged model weights / HF hub cache at /nfsdata/sa/models/, bind-mounted over the container HF_HUB_CACHE so `hf download "$MODEL"` reuses the staged models--org--name caches instead of re-downloading from HF.

Both paths are node-local ext4 at the same path on every compute node; import and run share one Slurm job on a single node, so node-local storage suffices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-13T02:21:22Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27453624071
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27453624071

github-actions · 2026-06-13T02:22:22Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27453730455
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27453730455

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 441ba6d.

cursor · 2026-06-13T02:22:37Z

+  image: vllm/vllm-openai-rocm:minimax-m3
+  model: MiniMaxAI/MiniMax-M3-MXFP8
+  model-prefix: minimaxm3
+  runner: mi325x


Wrong runner type in config

High Severity

The new Vultr MiniMax-M3 entry sets runner to mi325x, so CI schedules the AMDS fleet and launch_mi325x-amds.sh instead of mi325x-vultr and launch_mi325x-vultr.sh. Staged weights at /nfsdata/sa/models/ and enroot cache at /enroot/sa are never used for this config.
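The relationship the review comment describes can be sketched as a lookup from the config's runner value to a launch script. This is a hypothetical illustration: the function name is invented and the repo's real dispatch logic is not shown in this thread. Only the two runner values and the two script names come from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the runner-to-launcher relationship described
# in the review comment; the real dispatch logic lives in the repo's CI.
set -euo pipefail

# Return the launch script associated with a runner value.
launcher_for_runner() {
  case "$1" in
    mi325x)       echo "launch_mi325x-amds.sh" ;;
    mi325x-vultr) echo "launch_mi325x-vultr.sh" ;;
    *)            echo "unknown runner: $1" >&2; return 1 ;;
  esac
}

launcher_for_runner "mi325x"        # what the flagged config selects
launcher_for_runner "mi325x-vultr"  # what the Vultr entry is meant to select
```

Read this way, the fix the comment implies is to set the entry's runner to mi325x-vultr so that the Vultr launcher and its staged paths are actually exercised.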

github-actions · 2026-06-13T02:37:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27453730455
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27453730455

github-actions · 2026-06-13T02:39:05Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27454108525
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27454108525

github-actions · 2026-06-13T03:07:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27454108525
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27454108525

Node chi-mi325x-pod1-027 fails SLURM resume/boot: salloc grants an allocation, then relinquishes it with "Something is wrong with the boot of the nodes" (run 27454108525). This gates the minimaxm3-fp8-mi325x canary and thus the whole sweep. Add the node to the --exclude list alongside the existing pod1-121 exclusion until it is repaired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
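A minimal sketch of how such an exclusion list might be assembled for a single-node allocation follows. The function names and the partition name `compute` are assumptions, and the command is printed rather than executed. The full hostname of the pod1-121 node is not given in this thread, so it appears exactly as written in the commit message.

```shell
#!/usr/bin/env bash
# Illustrative only; not the exact salloc invocation used by the launch script.
set -euo pipefail

# Join host names with commas, the list format Slurm's --exclude accepts.
join_excludes() {
  local IFS=,
  printf '%s\n' "$*"
}

# Print (not run) a single-node salloc command with the exclusions applied.
salloc_cmd() {
  local partition="$1"; shift
  printf 'salloc --partition=%s --nodes=1 --exclude=%s\n' \
    "$partition" "$(join_excludes "$@")"
}

# "compute" is a placeholder partition name; "pod1-121" is copied as-is
# from the commit message because its full hostname is not stated.
salloc_cmd "compute" "pod1-121" "chi-mi325x-pod1-027"
```

Keeping the bad hosts in one list makes it easy to drop an entry again once the node is repaired.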

github-actions · 2026-06-13T03:30:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27455128357
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27455128357

Oseltamivir requested a review from a team June 13, 2026 01:45

github-project-automation Bot added this to InferenceMAX Board Jun 13, 2026

cursor Bot reviewed Jun 13, 2026

Comment thread runners/launch_mi325x-vultr.sh

M3

e9a0c41

Oseltamivir requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 13, 2026 02:15

Oseltamivir added the full-sweep-enabled label Jun 13, 2026

Oseltamivir added 2 commits June 12, 2026 19:16

Merge branch 'main' into add-mi325x-vultr-runner

26889b5

Update amd-master.yaml

441ba6d

cursor Bot reviewed Jun 13, 2026

Merge branch 'main' into add-mi325x-vultr-runner

b8c64ec

cquil11 closed this Jun 13, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 13, 2026

Oseltamivir wants to merge 6 commits into main from add-mi325x-vultr-runner


2 participants
